CHAPTER 8 Getting Your Data into the Computer 109

If you impute a date, be cautious: put the imputed date in a new column, and keep the original partial date for traceability. Any date imputation should be consistent with the study protocol and should not bias the results. Leave completely missing dates blank, because statistical software treats blank cells as missing data.
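The imputation step can be sketched in Python. This is a minimal illustration, not a standard rule. It assumes partial dates were typed as ISO-style text (YYYY, YYYY-MM, or YYYY-MM-DD). The imputation rule itself (the 15th for a missing day, July 1 for a missing month and day) is hypothetical, and so are the column names OnsetDt and OnsetDt_Imp. Your study protocol determines the actual rule.

```python
from datetime import date
from typing import Optional

# Hypothetical imputation rule for illustration only; the study protocol
# decides the real one. Missing day -> 15th; missing month and day -> July 1.
MISSING_DAY = 15
MISSING_MONTH_AND_DAY = (7, 1)


def impute_partial_date(raw: str) -> Optional[date]:
    """Impute a date from 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' text.

    A completely missing (blank) date is returned as None, not imputed.
    """
    raw = raw.strip()
    if not raw:
        return None
    parts = [int(p) for p in raw.split("-")]
    if len(parts) == 1:  # year only
        month, day = MISSING_MONTH_AND_DAY
        return date(parts[0], month, day)
    if len(parts) == 2:  # year and month only
        return date(parts[0], parts[1], MISSING_DAY)
    return date(parts[0], parts[1], parts[2])  # complete date, kept as is


# Hypothetical records; OnsetDt holds the date exactly as recorded.
records = [
    {"ID": 1, "OnsetDt": "2021-03-09"},
    {"ID": 2, "OnsetDt": "2021-03"},
    {"ID": 3, "OnsetDt": "2021"},
    {"ID": 4, "OnsetDt": ""},
]

for rec in records:
    # Write the imputed date to a new column; leave OnsetDt untouched.
    rec["OnsetDt_Imp"] = impute_partial_date(rec["OnsetDt"])
    print(rec)
```

The original OnsetDt text is never overwritten, so every imputed value can be traced back to what was actually recorded. The completely missing date stays missing.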

Because of the way most statistics programs store dates and times, they can easily calculate the interval between any two points in time by simple subtraction. It is best practice to store raw dates and times and let the computer calculate the intervals later, rather than calculating them yourself. For example, if you create variables for date of birth (DOB) and visit date (VisDt) in Excel, you can calculate an accurate age at the time of the visit with this formula:

$$\text{Age} = (\text{VisDt} - \text{DOB}) / 365.25$$
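The same calculation can be sketched in Python. Subtracting one `datetime.date` from another gives a `timedelta`, and its `.days` attribute is the interval in whole days. The function name `age_at_visit` and the example dates are made up for illustration.

```python
from datetime import date


def age_at_visit(dob: date, vis_dt: date) -> float:
    """Age in years at the visit, computed as (VisDt - DOB) / 365.25."""
    # date - date yields a timedelta; .days is the interval in whole days.
    return (vis_dt - dob).days / 365.25


dob = date(1980, 6, 15)
vis_dt = date(2024, 6, 15)
print(age_at_visit(dob, vis_dt))  # 44.0
```

Dividing by 365.25 averages in the extra leap-year day. The example dates span 44 years, including 11 leap days (16,071 days in all), so the result is exactly 44.0.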

Checking Your Entered Data for Errors

After you’ve entered all your data into the computer, there are a few things you can do to check for errors:

» Examine the smallest and largest values in numerical data: Have the software show you the smallest and largest values for each numerical variable. This check can often catch decimal-point errors (such as a hemoglobin value of 125 g/dL instead of 12.5 g/dL) or transposition errors (for example, a weight of 517 pounds instead of 157 pounds).

» Sort the values of variables: If your program can show you a sorted list of all the values for a variable, that’s even better, because a sorted list often reveals misclassified categories as well as numerical outliers.

» Search for blanks and commas: You can have Excel search for blanks in category values that shouldn’t have blanks, or for commas in numeric variables. Make sure the “Match entire cell contents” option is deselected in the Find and Replace dialog box (you may have to click the Options button to see the check box). Statistical software can also do this search. Be wary if there is a large number of missing values, because this could indicate a data collection problem.

» Tabulate categorical variables: You can have your statistics program tabulate each categorical variable, showing how often each category occurs in your data. This check usually finds misclassified categories. It is especially important because blanks and special characters in character variables can cause incorrect results when you query the data.
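The four checks above can be sketched in plain Python to show what the software is doing. The hemoglobin and sex values here are hypothetical. They are held as text, the way they might come out of a spreadsheet. A statistics package would normally run these checks with its own built-in commands.

```python
from collections import Counter

# Hypothetical data as typed into a spreadsheet (all values kept as text).
hemoglobin = ["12.5", "13.1", "125", "11.8", "14,2"]  # g/dL
sex = ["M", "F", "F ", "m", "", "F"]

# 1. Smallest and largest values (entries with commas are caught in step 3).
hgb_numbers = [float(v) for v in hemoglobin if "," not in v]
print("Hemoglobin range:", min(hgb_numbers), "to", max(hgb_numbers))

# 2. Sorted list of all values: outliers collect at either end.
print("Sorted:", sorted(hgb_numbers))

# 3. Blanks in categories and commas in numbers. The "in" test matches a
#    substring, like Find with "Match entire cell contents" deselected.
blank_rows = [i for i, v in enumerate(sex) if v == "" or " " in v]
comma_rows = [i for i, v in enumerate(hemoglobin) if "," in v]
print("Rows with blanks in Sex:", blank_rows)
print("Rows with commas in Hemoglobin:", comma_rows)

# Share of missing (blank) entries; a large share may signal a collection problem.
missing_share = sum(1 for v in sex if not v.strip()) / len(sex)
print(f"Missing Sex values: {missing_share:.0%}")

# 4. Frequency table of a categorical variable; repr() makes stray blanks visible.
for category, count in sorted(Counter(sex).items()):
    print(repr(category), count)
```

In the frequency table, "F " (with a trailing blank) and lowercase "m" show up as categories of their own. This is exactly the kind of misclassification that tabulating is meant to expose.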